npj Genomic Medicine — Latest Matching Preprints

1

Tumor Patterns and Cancer Risk in Carriers of TP53 exonic Germline Variants that alter mRNA Splicing

Schönegger, D.; Montellier, E.; Blanchet, S.; Freycon, C.; Monti, P.; Goudie, C.; Bougeard, G.; Kratz, C. P.; Hainaut, P.; Reymer, A.

2025-09-27 oncology 10.1101/2025.09.22.25336167 medRxiv

Top 0.1%

34.7%

Pathogenic germline variants in the TP53 gene cause Li-Fraumeni syndrome (LFS), a highly penetrant cancer predisposition disorder. Most of these variants arise from single-nucleotide variations (SNVs) in TP53 exons, causing missense mutations. However, some of these SNVs may also alter mRNA splicing, defining spliceogenic single nucleotide variants (SE-SNVs) of uncertain clinical significance. We reassessed previously classified TP53 missense variants for spliceogenic effects using SpliceAI predictions, in vitro minigene assays, and transcriptomic data from TCGA. Genotype-phenotype correlations were evaluated using clinical data from carriers of TP53 germline variants across multiple databases and registries. Among 58 identified SE-SNVs, 40 were missense and 18 synonymous. Experimental validation showed that most induce aberrant splicing events, frequently via cryptic splice site activation, leading to frameshift and premature stop codons. Several missense variants previously classified as having mild or low pathogenicity were found instead to have strong spliceogenic effects and were associated with early-onset cancers typical of LFS, suggesting that splicing alterations may override their protein-coding impact. The frequent SNV c.375G>A leading to the synonymous variant p.T125= shows intermediate severity, likely due to partial retention of normal splicing activity. Our study highlights the underestimated pathogenic potential of SE-SNVs affecting the TP53 gene. These findings underscore the importance of integrating splicing predictions, functional assays, and transcript-level analyses into TP53 variant interpretation to improve risk stratification in LFS.

2

Estimating the prevalence of late-onset Fabry disease in the US in 2024

Cook, J.; Coker, T.; Card-Gowers, J.; Webber, L.

2024-12-14 public and global health 10.1101/2024.12.13.24319001 medRxiv

Top 0.1%

28.3%

Fabry disease is a rare lysosomal storage condition in which sphingolipid levels build up to harmful levels in various bodily organs, eventually leading to life-threatening complications such as stroke and kidney failure. Fabry disease is caused by rare pathogenic alleles in the GLA gene on chromosome X and may present as an early or late-onset disease depending on the identity of the causal allele and the severity of its effect on the gene product. Epidemiological studies have widely varied in their estimation of Fabry disease prevalence: estimates based on reported clinical cases range from 1 in 40,000 to 1 in 170,000 individuals, whilst recent estimates based on newborn screening are much higher, ranging from 1 in 1,250 to 1 in 21,973 individuals. The primary aim of this study was to estimate the prevalence of Fabry disease in the US in 2024 by analysing selected GLA variants mostly associated with late-onset Fabry disease, projecting their allele frequencies to the US population and applying penetrance data from the literature to calculate how many causal allele carriers would be expected to be symptomatic for the disease at some point within their lifetime. 8 causal genetic variants were selected for analysis in this study based on their inclusion in a previous Fabry disease study using data from the UK Biobank. Allele frequencies for all 8 variants in global ancestry groups were extracted from gnomAD v4.1. The size and demographic makeup of the US population in 2024 was obtained from the US Census Bureau and mapped to gnomAD v4.1 ancestry groups, using previously reported estimates of the ancestral composition of Census groups encompassing multiple ancestry groups. Carrier counts by sex and ethnic group were calculated by projecting the summed allele frequencies to the US population using the Hardy-Weinberg equation and taking into consideration the X-linked mode of inheritance, assuming each individual can only carry 1 pathogenic variant. 
It was found that pathogenic alleles are present in the gnomAD v4.1 sample for all variants in the non-Finnish European gnomAD ancestry group, for 2 variants in the South Asian ancestry group, and for 1 variant in the African / African American and East Asian ancestry groups. For the remaining 5 ancestry groups, there are no pathogenic alleles recorded in the gnomAD v4.1 dataset across all 8 variants included for analysis in the study. Results show the highest pathogenic allele carrier frequencies in the European (non-Finnish) ancestry group, followed by the South Asian, East Asian and African / African American ancestry groups. Using reported penetrance figures of 100% for males and 70% for females, it is estimated that the carrier and symptomatic populations of Fabry disease in the US in 2024, based on analysis of the 8 included variants, are 12,024 male carriers (or 1 in 14,022 males) who will all develop symptoms, and 24,845 female carriers (or 1 in 6,978 females), of whom 17,392 will develop symptoms. Of these carriers who will develop symptoms, around 98.6% (corresponding to 11,858 men and 17,153 women) will carry a variant primarily associated with late-onset or both forms of Fabry disease. The prevalence figures presented in this study are significantly higher than those based on reported clinical cases and are in line with those presented more recently based on newborn screening studies and with the prevalence reported in the UK Biobank analysis. The US National Institutes of Health reports Fabry disease prevalence at around 1 in 50,000 males (which would correspond to 1 in 25,000 females). Analysing just 8 of the potentially hundreds of causal variants within the GLA gene, this study suggests that Fabry disease may be over 3 times as prevalent as is currently believed.
This work highlights the vast potential of large genetic databases to analyse rare diseases, which will continue to progress as these datasets add more data, improving their power and diversity.

What Is Already Known On This Topic
- Fabry Disease is a rare X-linked lysosomal storage disorder with historical prevalence estimates ranging from 1 in 40,000 to 1 in 170,000 males, based on case ascertainment.
- More recent newborn screening studies that test alpha-galactosidase A activity or perform genetic testing within the GLA gene, in addition to a UK Biobank study examining the prevalence of selected causal Fabry disease variants, have consistently suggested that Fabry disease may be far more prevalent than the estimates based on case ascertainment.

What This Study Adds
- To our knowledge, this is the first study providing population-level estimates of the number of causal Fabry disease carriers and of the symptomatic population in the US using publicly available data from gnomAD v4.1. Our estimates are consistent with those produced by newborn screening studies and the UK Biobank analysis, and suggest that late-onset Fabry disease may affect >1 in 10,000 people in the US in 2024 at some point during their lifetime.
- This study also demonstrates the potential of large genetic databases, such as gnomAD, for the study of rare genetic diseases, which are often misdiagnosed and may consequently be believed to be rarer than they are in reality.

How This Study May Affect Research
- This study highlights two areas for improvement which would be significantly beneficial to the study of rare genetic diseases.
  - While this study demonstrates the utility of genetic databases to study certain rare genetic diseases, it is likely that the study of rarer conditions, in particular those manifesting during childhood and/or with a dominant mode of inheritance, would be more difficult using genetic databases, as individuals with such conditions are less likely to be included in population-level genetic biobanks (such as UK Biobank) due to a healthy volunteer bias. It is important that future genetic datasets are more representative in their recruitment to ensure that rare genetic diseases are not systematically excluded or underrepresented among participants. Studies such as All Of Us in the US, and Our Future Health and the Generation Study in the UK, will be extremely helpful in addressing this point.
  - Estimates of the symptomatic Fabry disease population in the US in 2024 were calculated using the most up-to-date penetrance estimates in males and females. However, these estimates were calculated using individuals already present in a Fabry registry and therefore may overestimate the penetrance, especially among females, since asymptomatic carriers may be less likely to join a disease registry. Accurate calculation of the symptomatic population with a given genetic disease relies upon accurate penetrance estimates, which are not always available. These estimates are best calculated from large population-level resources with linked genetic and electronic health record data.
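
The carrier projection described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes Hardy-Weinberg proportions for an X-linked allele (frequency q in hemizygous males, 2q(1 - q) in heterozygous females, with homozygous females ignored in line with the one-variant-per-individual assumption). The function names and the example inputs are hypothetical.

```python
def x_linked_carriers(q, n_males, n_females):
    """Expected carrier counts for a rare X-linked allele with summed frequency q."""
    male_carriers = q * n_males                    # males have one X: frequency q
    female_carriers = 2 * q * (1 - q) * n_females  # heterozygous females: 2q(1 - q)
    return male_carriers, female_carriers


def symptomatic(carriers, penetrance):
    """Carriers expected to develop symptoms at some point in their lifetime."""
    return carriers * penetrance


if __name__ == "__main__":
    # Hypothetical inputs, for illustration only (not values from the study).
    males, females = x_linked_carriers(q=1e-4, n_males=1_000_000, n_females=1_000_000)
    print(f"male carriers: {males:.1f}, female carriers: {females:.1f}")
    # Penetrance figures quoted in the abstract: 100% for males, 70% for females.
    print(f"symptomatic males: {symptomatic(males, 1.0):.1f}")
    print(f"symptomatic females: {symptomatic(females, 0.7):.1f}")
```

Applied to the abstract's own figures, `symptomatic(24845, 0.70)` gives 17,391.5, consistent with the reported 17,392 symptomatic women after rounding.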

3

Missense but mis-spliced: germline TP53 variant c.671A>C (p.E224A) and the path from uncertainty to pathogenicity

Velkova, I.; Cappato, S.; Rivera, D.; Romano, F.; Schonegger, D.; Bocciardi, R.; Hainaut, P.; De Marco, P.; Gismondi, V.; Cirmena, G.; Menta, L.; Ognibene, M.; Garaventa, A.; Manzitti, C.; Brugnara, S.; Ciribilli, Y.; Bisio, A.; Marcaccini, E.; Malatesta, P.; Faravelli, F.; Menichini, P.; Monti, P.; Capra, V.

2025-08-01 oncology 10.1101/2025.07.31.25332437 medRxiv

Top 0.1%

23.8%

The TP53 gene encodes the well-known P53 tumor suppressor protein, which plays a crucial role in preventing cancer development. Germline TP53 variants cause Li-Fraumeni Syndrome (LFS), an autosomal dominant disorder associated with early-onset cancers, including breast cancer, brain tumors, leukemias, bone cancers, and soft tissue sarcomas. Functional studies in yeast and human cells demonstrated that TP53 variants can have various effects, such as partial or complete loss of function and even gain of pro-oncogenic activities. Here, we identified a germline TP53 variant c.671A>C, resulting in the missense mutant protein p.E224A, in the context of early-onset retroperitoneal rhabdomyosarcoma occurring in a child with a notable family history of cancer, suggestive of LFS. The variant was initially classified as a variant of uncertain significance (VUS). Functional assays in yeast and human cells demonstrated wild type-like activity of the protein p.E224A; however, in silico analysis predicted a splicing defect at the RNA level, which we further investigated using a minigene approach. This analysis showed that the variant c.671A>C causes the skipping of exon 6, potentially introducing a frameshift in cDNA and a premature stop codon, which likely triggers nonsense-mediated mRNA decay; the loss of heterozygosity at the c.671 position in the parent's TP53 transcript further confirmed the splicing impairment. In summary, these findings supported reclassifying the TP53 germline variant c.671A>C (p.E224A) from VUS to likely pathogenic, providing a definitive molecular diagnosis for family counseling. Additionally, this study sheds light on how certain TP53 variants that are defined as missense can be linked to disease mechanisms through RNA splicing disruption, highlighting the need for their deep functional assessment.

4

Detection of Pre-Existing Immunity to Bacterial Cas9 Proteins in People with Cystic Fibrosis

Serpa, G.; Gong, Q.; De, M.; Rana, P. S. J. B.; Montgomery, C. P.; Wozniak, D. J.; Long, M. E.; Hemann, E. A.

2025-03-22 immunology 10.1101/2025.03.20.644396 medRxiv

Top 0.1%

23.5%

Cystic fibrosis (CF) is caused by homozygous mutations in the cystic fibrosis transmembrane conductance regulator (CFTR) gene, resulting in multi-organ dysfunction and decreased lifespan and quality of life. A durable cure for CF will likely require a gene therapy approach to correct CFTR. Rapid advancements in genome editing technologies such as CRISPR/Cas9 have already resulted in successful FDA approval for cell-based gene editing therapies, providing new therapeutic avenues for many rare diseases. However, immune responses to gene therapy delivery vectors and editing tools remain a challenge, especially for strategies targeting complex in vivo tissues such as the lung. Previous findings in non-CF healthy individuals reported pre-existing antibody and T cell dependent immune responses to recombinant Cas9 proteins, suggesting potential additional obstacles for incorporation of CRISPR/Cas9 technologies in gene therapies. To determine if pre-existing immunity to Cas9 from S. aureus or S. pyogenes was present or augmented in people with CF (PwCF), anti-Cas9 IgG levels and Cas9-specific T cell responses were determined from peripheral blood samples of PwCF and non-CF healthy controls. Overall, non-CF controls and PwCF displayed evidence of pre-existing antibody and T cell responses to both S. aureus and S. pyogenes Cas9, although there were no significant differences between the two populations. However, we observed global changes in activation of Th1 and CD8 T cell responses as measured by IFNγ and TNF that warrant further investigation and mechanistic understanding as this finding has implications not only for CRISPR/Cas9 gene therapy for PwCF, but also for protection against infectious disease.

5

Genetic variants in DDX53 contribute to Autism Spectrum Disorder associated with the Xp22.11 locus

Scala, M.; Bradley, C. A.; Howe, J. L.; Trost, B.; Bautista Salazar, N.; Shum, C.; Reuter, M. S.; MacDonald, J. R.; Ko, S. Y.; Frankland, P. W.; Granger, L.; Anadiotis, G.; Pullano, V.; Brusco, A.; Keller, R.; Parisotto, S.; Pedro, H. F.; Lusk, L.; Pojomovsky McDonnell, P.; Helbig, I.; Mullegama, S. V.; Douine, E. D.; Russell, B. E.; Nelson, S. F.; Zara, F.; Scherer, S. W.

2023-12-27 genetic and genomic medicine 10.1101/2023.12.21.23300383 medRxiv

Top 0.1%

22.7%

Autism Spectrum Disorder (ASD) exhibits an ~4:1 male-to-female sex bias and is characterized by early-onset impairment of social/communication skills, restricted interests, and stereotyped behaviors. Disruption of the Xp22.11 locus has been associated with ASD in males. This locus includes the three-exon PTCHD1 gene, an adjacent multi-isoform long noncoding RNA (lncRNA) named PTCHD1-AS (spanning ~1 Mb), and a poorly characterized single-exon RNA helicase named DDX53 that is intronic to PTCHD1-AS. While the relationship between PTCHD1/PTCHD1-AS and ASD is being studied, the role of DDX53 has not been examined, in part because there is no apparent functional murine orthologue. Through clinical testing, here, we identified 6 males and 1 female with ASD from 6 unrelated families carrying rare, predicted-damaging or loss-of-function variants in DDX53. Then, we examined databases, including the Autism Speaks MSSNG and Simons Foundation Autism Research Initiative, as well as population controls. We identified 24 additional individuals with ASD harboring rare, damaging DDX53 variations, including the same variants detected in two families from the original clinical analysis. In this extended cohort of 31 participants with ASD (28 male, 3 female), we identified 25 mostly maternally-inherited variations in DDX53, including 18 missense changes, 2 truncating variants, 2 in-frame variants, 2 deletions in the 3' UTR and 1 copy number deletion. Our findings in humans support a direct link between DDX53 and ASD, which will be important in clinical genetic testing. These same autism-related findings, coupled with the observation that a functional orthologous gene is not found in mouse, may also influence the design and interpretation of murine-modelling of ASD.

6

Noncoding de novo mutations in SCN2A are associated with autism spectrum disorders

Zhang, Y.; Ahsan, M. U.; Wang, K.

2024-05-06 genetic and genomic medicine 10.1101/2024.05.05.24306908 medRxiv

Top 0.1%

22.2%

Previous genetic studies in ASD identified hundreds of high-confidence ASD genes enriched with likely deleterious protein-coding de novo mutations (DNMs). Multiple studies also demonstrated that DNMs in the non-coding genome can contribute to ASD risk. However, identification of individual risk genes enriched with noncoding DNMs has remained largely unexplored. We analyzed two datasets with over 5000 ASD families to assess the contribution of noncoding DNMs. We used two methods to assess statistical significance for noncoding DNMs: a point-based test that analyzes sites that are likely functional, and a segment-based test that analyzes 1 kb genomic segments with segment-specific background mutation rates. We found that coding and noncoding DNMs in SCN2A are associated with ASD risk. Further application of these approaches on large-scale whole genome sequencing data will aid in identifying additional candidate ASD risk genes.
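
One common way to test whether a genomic segment carries more de novo mutations than expected is a one-sided Poisson test against the segment's background rate. The sketch below illustrates that general idea only. The abstract does not specify the test statistic, so the Poisson model, the function names, and the example numbers are all assumptions.

```python
import math


def poisson_upper_tail(observed, expected):
    """P(X >= observed) for X ~ Poisson(expected)."""
    if observed <= 0:
        return 1.0
    # Sum P(X = i) for i < observed, then take the complement.
    below = sum(
        math.exp(-expected) * expected**i / math.factorial(i)
        for i in range(observed)
    )
    return max(0.0, 1.0 - below)


def expected_dnms(rate_per_bp, segment_length_bp, n_families):
    """Expected DNM count in one segment across a cohort.

    rate_per_bp is a per-base, per-child mutation rate for the segment
    (both parental copies combined); it is a hypothetical input here.
    """
    return rate_per_bp * segment_length_bp * n_families


if __name__ == "__main__":
    # Hypothetical values: a 1 kb segment, 5000 families, 3 observed DNMs.
    lam = expected_dnms(rate_per_bp=2.4e-8, segment_length_bp=1000, n_families=5000)
    p_value = poisson_upper_tail(observed=3, expected=lam)
    print(f"expected DNMs: {lam:.3f}, one-sided p-value: {p_value:.3g}")
```

A genome-wide scan over many segments would also need a multiple-testing correction, which this sketch leaves out.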

7

Structure-based network analysis predicts mutations associated with inherited retinal disease

Hauser, B. M.; Luo, Y.; Nathan, A.; Gaiha, G. D.; Vavvas, D.; Comander, J.; Pierce, E. A.; Place, E. M.; Bujakowska, K. M.; Rossin, E. J.

2023-07-06 ophthalmology 10.1101/2023.07.05.23292247 medRxiv

Top 0.1%

18.3%

With continued advances in gene sequencing technologies comes the need to develop better tools to understand which mutations cause disease. Here we validate structure-based network analysis (SBNA) in well-studied human proteins and report results of using SBNA to identify critical amino acids that may cause retinal disease if subject to missense mutation. We computed SBNA scores for genes with high-quality structural data, starting with validating the method using 4 well-studied human disease-associated proteins. We then analyzed 47 inherited retinal disease (IRD) genes. We compared SBNA scores to phenotype data from the ClinVar database and found a significant difference between benign and pathogenic mutations with respect to network score. Finally, we applied this approach to 65 patients at Massachusetts Eye and Ear (MEE) who were diagnosed with IRD but for whom no genetic cause was found. Multivariable logistic regression models built using SBNA scores for IRD-associated genes successfully predicted pathogenicity of novel mutations, allowing us to identify likely causative disease variants in 37 patients with IRD from our clinic. In conclusion, SBNA can be meaningfully applied to human proteins and may help predict mutations causative of IRD.

8

Targeted BRCA1/BRCA2 Sequencing in a Bangladeshi Clinically Referred Cohort Identifies Candidate BRCA1 Loss-of-Function Variants and a Multi-Exon Deletion-Like CNV Signal

Al Sium, S. M.; Banu, T. A.; Goswami, B.; Naser, S. R.; Habib, M. A.; Akter, S.; Ara, M. H.; Al Din, S. M. S.; Nafisa, A.; Nayem, M. R.; Rabbi, M. F. A.; Sarkar, M. M. H.; Khan, M. S.

2026-05-20 oncology 10.64898/2026.05.11.26352643 medRxiv

Top 0.1%

17.3%

Background: Population-relevant BRCA1/BRCA2 data from Bangladesh are scarce, creating challenges for hereditary breast and ovarian cancer variant interpretation, counseling, and follow-up testing. We examined a clinically referred Bangladeshi cohort to characterize assay-derived BRCA1/BRCA2 short variants, sequencing-depth performance, and copy-number findings in a conservative pilot framework. Methods: Twenty-three de-identified blood-derived DNA samples were assessed using a targeted BRCA1/BRCA2 next-generation sequencing workflow. Downstream analysis used assay-generated short-variant, coverage, and CNV outputs, with coordinates reported on hg19/GRCh37. Short variants were evaluated from high-confidence PASS/VCC-H calls, and CNV review incorporated both target-region and amplicon-level copy-number patterns. Results: After removal of four low-VAF review observations, the primary germline-compatible dataset comprised 304 short-variant observations representing 34 unique variants. Both BRCA1 and BRCA2 contributed comparable variant burdens, while the overall profile was mainly composed of missense and synonymous changes. Six sample-specific heterozygous BRCA1 truncating candidates were observed, including five frameshift variants and one stop-gain variant. Protein-level mapping placed these events across the central-to-C-terminal portion of BRCA1. Sequencing depth was consistently high across the targeted regions, with all 4,255 amplicon-sample measurements exceeding 280x and 99.91% reaching at least 500x. Copy-number analysis highlighted one candidate BRCA1 multi-exon deletion-like event involving exons 15-20 in BCSIR-BRCA-21, with unresolved partial exon 14 involvement. Conclusions: This study provides an initial Bangladesh-focused targeted BRCA1/BRCA2 dataset and identifies candidate short-variant and CNV findings for validation. 
These findings should be interpreted as analytical candidates only and require confirmatory testing and expert clinical curation before any clinical application. The cohort is referral-enriched and should not be used to infer population prevalence.

9

The Genetic Landscape and Epidemiological Characteristics of Inherited Retinal Diseases in the Chinese Population

Zeng, B.; Cui, Z.; Zhou, S.; Dai, W.

2026-05-29 ophthalmology 10.64898/2026.05.27.26354224 medRxiv

Top 0.1%

17.2%

Background: Inherited Retinal Diseases (IRDs) are a group of genetically heterogeneous blinding conditions. Major global genomic reference databases are disproportionately enriched for individuals of European ancestry. This underrepresentation creates a significant bias that impedes the accuracy of genetic diagnosis in the Chinese population. This study aims to address this limitation by constructing a comprehensive genetic landscape of IRDs using large-scale deep-sequencing data from a large Chinese cohort. Methods: The study leveraged variant data primarily from 10,588 individuals in the China Metabolic Analytics Project (ChinaMAP) and cross-referenced findings against multiple national and international databases. We systematically curated variants within a targeted panel of 291 IRD-associated genes. Variant pathogenicity was assessed using a comprehensive pipeline integrating InterVar-automated classification based on 2015 American College of Medical Genetics and Genomics/Association for Molecular Pathology (ACMG/AMP) guidelines, ClinVar evidence (review status ≥ 1 star), and manual literature curation. We delineated the mutational spectrum, identified population-enriched pathogenic/likely pathogenic (P/LP) variants, and analyzed the distribution characteristics of IRD-associated highly-mutated genes. Furthermore, we calculated the carrier frequencies (CF) and genetic prevalence (GP) of autosomal recessive (AR)-IRD genes in the Chinese population. Results: The study revealed a highly concentrated genetic landscape for AR-IRDs in the Chinese population, with ABCA4 and USH2A emerging as the primary drivers of the genetic burden. This finding aligns with previous Chinese cohorts but contrasts with global databases, where genes such as the X-linked RPGR are more prevalent. In contrast, autosomal dominant (AD)-IRDs exhibited high locus heterogeneity, with pathogenic variants dispersed across numerous genes (e.g., COL2A1 and MFN2).
We identified a series of P/LP variants that were either high-frequency or significantly enriched in the Chinese population, such as CNGB1 (p.P530R) and specific recurrent alleles in ABCA4 and CYP4V2. The estimated cumulative CF for AR-IRDs was 1 in 5.60, and the theoretical total GP was 1 in 2,624.67, based on the ChinaMAP data. Conclusion: By integrating the ChinaMAP dataset with diverse genomic resources, this study provides a genetic landscape of IRDs in the Chinese population. Our analysis shows a concentrated mutational spectrum in AR-IRDs, contrasting with the pronounced heterogeneity in AD-IRDs. These findings, including population-specific pathogenic variants and refined prevalence estimates, provide a resource for precision diagnostics, genetic counseling, expanded carrier screening (ECS), and public health policy development in China.
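
Carrier frequency and genetic prevalence for an autosomal recessive gene are conventionally derived from the summed P/LP allele frequency q under Hardy-Weinberg equilibrium: CF = 2q(1 - q) and GP = q^2. The sketch below shows that textbook calculation. Whether the authors used exactly these formulas, or this way of combining genes, is not stated in the abstract. The gene labels and allele frequencies are hypothetical, and genes are assumed to be independent.

```python
def carrier_frequency(q):
    """Heterozygous carrier frequency for summed P/LP allele frequency q."""
    return 2 * q * (1 - q)


def genetic_prevalence(q):
    """Expected frequency of biallelic (affected) genotypes for one gene."""
    return q**2


def cumulative(per_gene_q):
    """Combine genes, assuming independence between them.

    Returns (probability of carrying a P/LP allele in at least one gene,
    summed genetic prevalence across genes).
    """
    p_not_carrier = 1.0
    total_gp = 0.0
    for q in per_gene_q.values():
        p_not_carrier *= 1 - carrier_frequency(q)
        total_gp += genetic_prevalence(q)
    return 1 - p_not_carrier, total_gp


if __name__ == "__main__":
    # Hypothetical summed allele frequencies per gene, for illustration only.
    example = {"GENE_A": 0.01, "GENE_B": 0.02}
    cf, gp = cumulative(example)
    print(f"cumulative CF: 1 in {1 / cf:.2f}")
    print(f"total GP: 1 in {1 / gp:.2f}")
```

Expressing results as "1 in N" mirrors how the abstract reports its cumulative CF and total GP.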

10

The role of alternative splicing in CEP290-related disease pathogenesis

Taylor, R. D.; Poulter, J. A.; Cockburn, J.; Ladbury, J. E.; Peckham, M.; Johnson, C. A.

2022-03-04 genetic and genomic medicine 10.1101/2022.03.03.22271834 medRxiv

Top 0.1%

17.1%

Primary ciliopathies are a group of inherited developmental disorders resulting from defects in the primary cilium. Mutations in CEP290 (Centrosomal protein of 290kDa) are the most frequent cause of recessive ciliopathies (incidence up to 1:15,000). Pathogenic variants span the full length of this large (93.2kb) 54 exon gene, causing phenotypes ranging from isolated inherited retinal dystrophies (IRDs; Leber Congenital Amaurosis, LCA) to a pleiotropic range of severe syndromic multi-organ ciliopathies affecting retina, kidney and brain. Most pathogenic CEP290 variants are predicted null (37% nonsense, 42% frameshift), but there is no clear genotype-phenotype association. Almost half (26/53) of the coding exons in CEP290 are in-phase "skiptic" (or skippable) exons. Variants located in skiptic exons could be removed from CEP290 transcripts by skipping the exon, and nonsense-associated altered splicing (NAS) has been proposed as a mechanism that attenuates the pathogenicity of nonsense or frameshift CEP290 variants. Here, we have used in silico bioinformatic techniques to study the propensity of CEP290 skiptic exons for NAS. We then used CRISPR-Cas9 technology to model CEP290 frameshift mutations in induced pluripotent stem cells (iPSCs) and analysed their effects on splicing and ciliogenesis. We identified exon 36, a hotspot for LCA mutations, as a strong candidate for NAS that we confirmed in mutant iPSCs that exhibited sequence-specific exon skipping. Exon 36 skipping did not affect ciliogenesis, in contrast to a larger frameshift mutant that significantly decreased cilia size and incidence in iPSCs. We suggest that sequence-specific NAS provides the molecular basis of genetic pleiotropy for CEP290-related disorders.

11

AutScore- An integrative scoring approach for prioritization of ultra-rare autism spectrum disorder candidate variants from whole exome sequencing data

Shil, A.; Arava, N.; Levi, N.; Levine, L.; Golan, H.; Meiri, G.; Michaelovski, A.; Tsadaka, Y.; Aran, A.; Menashe, I.

2024-01-25 genetic and genomic medicine 10.1101/2024.01.24.24301544 medRxiv

Top 0.1%

15.2%

Background: Discerning clinically relevant ASD candidate variants from whole-exome sequencing (WES) data is complex, time-consuming, and labor-intensive. To this end, we developed AutScore, an integrative prioritization algorithm of ASD candidate variants from WES data, and assessed its performance to detect clinically relevant variants. Methods: We studied WES data from 581 ASD probands and their parents registered in the Azrieli National Center database for Autism and Neurodevelopment Research. We focused on rare (allele frequency <1%), high-quality proband-specific variants affecting genes associated with ASD or other neurodevelopmental disorders (NDDs). We assigned a score (i.e., AutScore) to each such variant based on its pathogenicity, clinical relevance, gene-disease association, and inheritance patterns. Finally, we compared the AutScore performance with the rating of clinical experts and the NDD variants prioritization algorithm, AutoCasC. Results: Overall, 1161 ultra-rare variants distributed in 687 genes in 441 ASD probands were evaluated by AutScore with scores ranging from -4 to 25, with a mean ± SD of 5.89 ± 4.18. An AutScore cut-off of ≥ 12 outperforms AutoCasC in detecting clinically relevant ASD variants, with a detection accuracy rate of 72.3% and an overall diagnostic yield of 11.9%. Sixteen variants with AutScore of ≥ 12 were distributed in fifteen novel ASD genes. Conclusion: AutScore is an effective automated ranking system for ASD candidate variants that could be implemented in ASD clinical genetics pipelines.

12

Policies, practices, and experiences of European biobanks on sharing genomic biobank results with donors - a survey of BBMRI-ERIC biobanks

Brunfeldt, M.; Vrijenhoek, T.; Kaariainen, H.

2025-09-27 public and global health 10.1101/2025.09.25.25336629 medRxiv

Top 0.1%

15.0%

To study European biobanks' policies, practices, and experiences on communicating individual research results to participants, the EU Horizon 2020 Project Genetics Clinic of the Future performed two surveys in 2016 and 2020. First, a questionnaire was sent in 2016 (Survey I) to 351 European biobanks in 13 countries that were members of Biobanking and Biomolecular Resources Research Infrastructure - European Research Infrastructure Consortium (BBMRI-ERIC). We received replies from 72 biobanks (response rate 21%), representing each of the 13 BBMRI Member States. Respondents were mainly directors or heads of biobanks. To evaluate how the policies and practices of biobanks evolved over time, we also conducted another survey in 2020 (Survey II). Survey I was implemented using the web-based Webropol tool, and Survey II was distributed by email. The biobanks had very different policies of sharing genomic data and the policies had changed over time. The percentage of biobanks with a policy to share results with participants if they so wish had increased between 2016 and 2020 from 36% to 45%. In contrast, the percentage of biobanks with a policy to pro-actively re-contact the participants to share (some) results had decreased from 52% to 39%. Still in 2020, half of the biobanks had never shared results with participants.

13

Performance of LFSPRO TP53 germline carrier risk predictions compared to standard genetic counseling practice on prospectively collected probands

Corredor, J. L.; Dodd-Eaton, E. B.; Woodman-Ross, J.; Woodson, A.; Nguyen, N. H.; Peng, G.; Green, S.; Gutierrez, A. M.; Arun, B. K.; Wang, W.

2024-07-10 oncology 10.1101/2024.07.09.24310095 medRxiv

Top 0.1%

14.8%

Genetic counseling and testing for germline mutations are essential for identifying individuals at increased risk for cancer. Pathogenic variants in TP53 are diagnostic of Li-Fraumeni syndrome (LFS), a highly penetrant disorder with diverse, early-onset tumors. Current clinical guidelines, such as Chompret and Classic criteria, provide frameworks for identifying individuals at risk for likely pathogenic/pathogenic TP53 variants; however, genetic counselors often encounter patients with features concerning for LFS that do not clearly meet established criteria, creating challenges for risk assessment and testing decisions. We evaluated whether LFSPRO, a Mendelian, family-history-based model that estimates an individual's probability of harboring a deleterious TP53 variant, improves carrier identification relative to guideline criteria. In a prospectively collected cohort of 182 probands who underwent clinical genetic counseling and germline TP53 testing, LFSPRO showed superior discrimination compared with Chompret criteria, with higher sensitivity (81% vs. 33%) and specificity (88% vs. 65%) and improved predictive values (PPV 0.53 vs. 0.14; NPV 0.96 vs. 0.85). Receiver operating characteristic analysis confirmed strong discriminatory performance (AUC=0.88). Calibration analysis using observed-to-expected ratios indicated good agreement between predicted and observed carrier frequencies (Observed/Expected=1.07). These findings demonstrate that LFSPRO outperforms traditional guideline-based criteria for identifying TP53 mutation carriers in real-world clinical settings. By providing quantitative, well-calibrated carrier probabilities rather than binary classifications, LFSPRO can enhance genetic counseling and support testing decisions, particularly for individuals who do not clearly meet existing criteria.
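
The performance measures quoted in this abstract follow standard definitions, sketched below for readers who want to reproduce them from a confusion matrix. This is a generic illustration rather than the LFSPRO software. The function names are hypothetical, and the example counts are invented because the abstract does not report the underlying table.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Standard binary-classifier metrics from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),  # carriers correctly flagged
        "specificity": tn / (tn + fp),  # non-carriers correctly cleared
        "ppv": tp / (tp + fp),          # flagged probands who are carriers
        "npv": tn / (tn + fn),          # cleared probands who are non-carriers
    }


def observed_to_expected(n_observed_carriers, predicted_probabilities):
    """Calibration ratio: observed carriers over the sum of predicted probabilities."""
    return n_observed_carriers / sum(predicted_probabilities)


if __name__ == "__main__":
    # Invented counts, for illustration only (not data from the study).
    for name, value in diagnostic_metrics(tp=8, fp=2, fn=2, tn=88).items():
        print(f"{name}: {value:.3f}")
    print(f"O/E: {observed_to_expected(2, [0.5, 0.5, 0.5, 0.5]):.2f}")
```

An observed-to-expected ratio near 1, such as the 1.07 reported here, indicates that the predicted carrier probabilities sum to roughly the number of carriers actually found.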

14

Is 7p14.1 an orofacial cleft risk locus? Genome-wide study of copy number variation in multiple populations provides both a replication of previous studies and an alternative explanation

Mukhopadhyay, N.; Feingold, E. E.; Brand, H.; Lee, M. K.; Kurtas, E. N.; Sanchis-Juan, A.; Moreno-Uribe, L.; Wehby, G.; Valencia-Ramirez, L. C.; Restrepo Muneton, C. P.; Padilla, C.; Deleyiannis, F.; Poletta, F. A.; Orioli, I. M.; Hecht, J. T.; Buxo, C. J.; Butali, A.; Adeyemo, W. L.; Abebe, M. E.; Vieira, A. R.; Shaffer, J. R.; Murray, J. C.; Weinberg, S. M.; Ruczinski, I.; Leslie-Clarkson, E. J.; Marazita, M. L.

2026-01-15 epidemiology 10.64898/2026.01.09.26343782 medRxiv

Top 0.1%

14.7%

Objective: Our understanding of the genetic causes of non-syndromic orofacial clefts (OFCs) is based largely upon genetic studies of common and rare nucleotide variants. Less is known about the role of copy number variations (CNVs) and the studies published to date have been limited to either small samples or targeted genomic regions. The objective of our study is to investigate the contribution of CNVs spread across the entire genome to OFC risk in a large multi-ancestry cohort. Methods: We utilized PennCNV on microarray genotyping data to detect CNVs in 10,240 participants (2,484 with clefts, 7,756 unaffected). 70,695 quality-filtered autosomal CNVs (49,660 deletions, 21,035 duplications) were used to assign normal/abnormal copy number statuses at 67,199 positions from the GRCh37 genome assembly. Genome-wide association was run between cleft status and copy number status. Results: We observed a highly significant association between OFCs and deletions on chromosome 7p14.1 (p=1.32e-35) driven by Central and South American ancestry (p=1.04e-25) participants, with less significant contributions from European (p=3.37e-08) and Asian (p=0.01) ancestry participants. We also observed four other loci with p-values below 10e-04. Conclusion: The 7p14.1 association observed in our study is a replication of two prior studies in independent cohorts of European ancestry. However, this locus lies in a T-cell receptor region that is subject to somatic rearrangements that decrease in frequency with age and may affect genetic association results. Our data show age effects as well as differences between blood and saliva samples. Thus, our results can be interpreted either as supporting a previously established association with orofacial clefts, or as questioning those previous results in favor of a hypothesis about the behavior of somatic rearrangements in T-cell receptor regions.

15

Mapping structural variants to rare disease genes using long-read whole genome sequencing and trait-relevant polygenic scores

LeMaster, C.; Schwendinger-Schreck, C.; Ge, B.; Cheung, W.; Johnston, J.; Pastinen, T.; Smail, C.

2024-03-18 genetic and genomic medicine 10.1101/2024.03.15.24304216 medRxiv

Top 0.1%

14.5%

Recent studies have revealed the pervasive landscape of rare structural variants (rSVs) present in human genomes. rSVs can have extreme effects on the expression of proximal genes and, in a rare disease context, have been implicated in patient cases where no diagnostic single nucleotide variant (SNV) was found. Approaches for integrating rSVs to date have focused on targeted approaches in known Mendelian rare disease genes. This approach is intractable for rare diseases with many causal loci or patients with complex, multi-phenotype syndromes. We hypothesized that integrating trait-relevant polygenic scores (PGS) would provide a substantial reduction in the number of candidate disease genes in which to assess rSV effects. We further implemented a method for ranking PGS genes to define a set of core/key genes where an rSV has the potential to exert relatively larger effects on disease risk. Among a subset of patients enrolled in the Genomic Answers for Kids (GA4K) rare disease program (N=497), we used PacBio HiFi long-read whole genome sequencing (lrWGS) to identify rSVs intersecting genes in trait-relevant PGSs. Illustrating our approach in Autism (N=54 cases), we identified 22,019 deletions, 2,041 duplications, 87,826 insertions, and 214 inversions overlapping putative core/key PGS genes. Additionally, by integrating genomic constraint annotations from gnomAD, we observed that rare duplications overlapping putative core/key PGS genes were frequently in higher constraint regions compared to controls (P = 1 × 10⁻³). This difference was not observed in the lowest-ranked gene set (P = 0.15). Overall, our study provides a framework for the annotation of long-read rSVs from lrWGS data and prioritization of disease-linked genomic regions for downstream functional validation of rSV impacts. To enable reuse by other researchers, we have made SV allele frequencies and gene associations freely available.

16

In-Silico Characterization of TP53 Splice Mutations in Somatic and Germline Tumours

Bhandarkar, A. A.; Kelly-Foleni, N. E.; Sarkar, D.; Jeffs, A.; Slatter, T.; Braithwaite, A.; Mehta, S.

2025-06-01 cancer biology 10.1101/2025.05.27.656522 medRxiv

Top 0.1%

14.5%

TP53 undergoes alternative splicing to produce multiple mRNA transcripts and protein isoforms, yet the effects of splice site mutations on isoform regulation, tumor biology, and clinical outcome remain unclear. Analysis of 23,017 TP53 variants, including 18,562 somatic mutations (pan-cancer datasets from cBioPortal) and 4,455 germline variants (IARC database), identified recurrent donor (X32, X125, X224, X261, X331) and acceptor (X33, X126, X187, X225, X307, X332) splice site mutations. Germline variants showed nucleotide-specific transition biases. Most splice site mutations were associated with reduced TP53 mRNA expression; however, X32, X33, X126, and X261 maintained or elevated transcript levels. Splice mutations were associated with distinct transcriptional subsets marked by altered p53 target gene expression, elevated tumor mutation burden, increased genomic instability, and significantly reduced disease-free survival compared to missense mutations, with X126 and X331 being associated with the poorest outcomes. These findings emphasize the clinical impact of TP53 splice site mutations and the need for functional classification.

17

Evaluating the impact of compound heterozygosity involving microdeletions and sequence-level variants: findings in autism

Engchuan, W.; Han, K.; Feitosa, R. M.; Salazar, N. B.; Mager, D. J.; Wu, S.; Ali, F.; Chan, A.; Mendes de Aquino, M.; Zhou, X.; Shaath, R.; Safarian, N.; Thiruvahindrapuram, B.; Nalpathamkalam, T.; Pellecchia, G.; de Rijke, J.; Zarrei, M.; Breetvelt, E.; Scherer, S. W.; Trost, B.; Vorstman, J.

2025-10-21 genetic and genomic medicine 10.1101/2025.10.17.25338215 medRxiv

Top 0.1%

14.4%

Compound heterozygous events involving a chromosome deletion and, on the remaining allele, a functional DNA sequence-level variant can underpin a range of medical conditions. Most large-scale genetic studies do not include a systematic analysis of such compound heterozygous deletion (DelCH) events. We developed three frameworks: i) traditional burden analysis; ii) deletion-matched burden analysis; and iii) transmission disequilibrium test (TDT), to examine the possible contribution of DelCH to clinical presentations, and report results of their implementation in 9,766 families of autistic individuals. Across the three strategies, we observed enrichment of rare DelCH events in autistic individuals at a nominal significance level for individual tests. Collectively, six genes; CFHR4, HSDL1, MYO15A, NEFH, and three olfactory receptor genes; OR1A2, OR4P2, were affected by DelCH events in at least two unrelated autistic individuals (and not in unaffected family members), while the reverse analyses identified no genes (p < 2.2 × 10⁻¹⁶). Gene set enrichment analysis of the extended network of candidate genes showed a remarkable convergence to processes related to neurogenesis. Our findings suggest a modest role for DelCH events in ASD. The strategies described here are available via a GitHub repository, allowing the research community to examine the role of DelCH in other genome sequencing cohorts.

18

TP53 minigene analysis of 161 sequence changes provides evidence for role of spatial constraint and regulatory elements on variant-induced splicing impact

Canson, D. M.; Llinares-Burguet, I.; Fortuno, C.; Sanoguera-Miralles, L.; Bueno-Martinez, E.; de la Hoya, M.; Spurdle, A. B.; Velasco-Sampedro, E. A.

2024-10-11 genetics 10.1101/2024.10.07.617118 medRxiv

Top 0.1%

14.2%

Germline TP53 genetic variants that disrupt splicing are implicated in hereditary cancer predisposition, while somatic variants contribute to tumorigenesis. We investigated the role of TP53 splicing regulatory elements (SREs), including G-runs that act as intronic splicing enhancers, using exons 3 and 6 and their downstream introns as models. Minigene microdeletion assays revealed four SRE-rich intervals: c.573_598, c.618_641, c.653_669 and c.672+14_672+36. A diagnostically reported deletion c.655_670del, overlapping an SRE-rich interval, induced an in-frame transcript Δ(E6q21) from new donor site usage. Within intron 6, deletion of at least four G-runs led to 100% aberrant transcript expression. Additionally, assay results suggested a donor-to-branchpoint distance cutoff of <50 nt for complete splicing aberration due to spatial constraint, and >75 nt for low risk of splicing abnormality. Overall, splicing data for 134 single nucleotide variants (SNVs) and 27 deletions in TP53 demonstrated that SRE-disrupting SNVs have weak splicing impact (up to 26% exon skipping), while deletions spanning multiple SREs can have profound splicing effects. Results also provide more data to inform splicing impact prediction for intronic deletions that shorten intron size.

19

Contribution of Rare Large High-Penetrance CNVs to Pediatric and MODY Diabetes in Norway

Artaza, H.; Priyanka, D.; Molnes, J.; Lavrichenko, K.; Wolff, A. S. B.; Royrvik, E. C.; Skrivarhaug, T.; Vaudel, M.; Bratland, E.; Johansson, B.; Njolstad, P. R.; Johansson, S.

2025-02-28 genetic and genomic medicine 10.1101/2025.02.25.25322584 medRxiv

Top 0.1%

12.6%

Technological advancements have significantly improved our understanding of Copy Number Variants (CNVs) and their role in disease. However, detecting CNVs in clinical diagnostics remains challenging, and important pathogenic CNVs may go undetected. This study systematically assessed the impact of rare, large, high-penetrance CNVs on pediatric diabetes and Maturity-onset Diabetes of the Young (MODY) in Norway. We analyzed data from the nationwide Norwegian Childhood Diabetes Registry (NCDR) covering 2002-2018 and the Norwegian MODY Registry (NMR) from 1997-2019. CNV detection was performed using the Illumina Infinium Global Screening Array-24 v2.0 on a total of 5,889 individuals and we compared the results to diagnostic records. Our findings indicate that 0.63% of the cases in the Norwegian MODY Registry and 0.09% in the Norwegian Childhood Diabetes Registry are attributable to established pathogenic large copy number deletions detectable by array genotyping. Notably, six of the 14 pathogenic deletions identified (in the HNF1B [n=3], HNF1A [n=2], or GATA4 genes [n=1]) had not been detected through routine diagnostic screening. For these individuals, accurate molecular diagnoses have significant implications for personalized treatment and follow-up. We found no evidence suggesting a major role for additional rare CNVs beyond the already established pathogenic CNVs in MODY. In conclusion, while pathogenic CNVs are rare, they remain relevant for patients of the Norwegian nationwide diabetes registries. Expanding screening for MODY variants, specifically 17q12-HNF1B and HNF1A deletions, to a larger portion of the pediatric diabetes population should be considered.

20

Biallelic loss of function variants in WBP4, encoding a spliceosome protein, result in a variable neurodevelopmental delay syndrome

Engal, E.; Oja, K. T.; Maroofian, R.; Geminder, O.; Le, T.-L.; Mor, E.; Tzvi, N.; Elefant, N.; Zaki, M. S.; Gleeson, J. G.; Muru, K.; Pajusalu, S.; Wojcik, M. H.; Pachat, D.; Abd Elmaksoud, M.; Jeong, W. C.; Lee, H.; Bauer, P.; Zifarelli, G.; Houlden, H.; Elpeleg, O.; Gordon, C.; Harel, T.; Ounap, K.; Salton, M.; Mor-Shaked, H.

2023-06-27 genetic and genomic medicine 10.1101/2023.06.19.23291425 medRxiv

Top 0.1%

12.5%

Over two dozen spliceosome proteins are involved in human diseases, also referred to as spliceosomopathies. WBP4 (WW Domain Binding Protein 4) is part of the early spliceosomal complex and has not previously been described in the context of human pathologies. Through GeneMatcher, we identified eleven patients from eight families with a severe neurodevelopmental syndrome with variable manifestations. Clinical manifestations included hypotonia, global developmental delay, severe intellectual disability, brain abnormalities, and musculoskeletal and gastrointestinal abnormalities. Genetic analysis revealed five different homozygous loss-of-function variants in WBP4. Immunoblotting on fibroblasts from two affected individuals with different genetic variants demonstrated complete loss of protein, and RNA sequencing analysis uncovered shared abnormal splicing patterns, including enrichment for abnormalities of the nervous system and musculoskeletal system genes, suggesting that the overlapping differentially spliced genes are related to the common phenotypes of the probands. We conclude that biallelic variants in WBP4 cause a spliceosomopathy. Further functional studies are needed to better understand the mechanism of pathogenicity.